ggplot2 and
leafletThis lab focuses on creating maps to display spatial data. It covers
both static maps (created using ggplot2) and interactive
maps (created using leaflet).
Directions (Please read before starting)
\(~\)
This lab will use both the ggplot2 and
leaflet packages. It will also require the
dplyr package for merging and joining, the
maps package, which contains the spatial information needed
to create a variety commonly used maps, and maptools, a
package used to process and read certain spatial data files.
# load the following packages
# install.packages("leaflet")
# install.packages("maps")
# install.packages("maptools")
library(leaflet)
library(ggplot2)
library(dplyr)
library(maps)
library(maptools)
\(~\)
There are three main geometric elements used to represent spatial data:
\(~\)
A choropleth map uses colored polygons to represent values corresponding to geographic units.
The first step in creating a choropleth map is creating the polygons that define each geographic unit:
## Get state polygons from the "maps" package
state_polys <- map_data("state")
## Graph these polygons
ggplot(data = state_polys, aes(x = long, y = lat, group = group)) +
geom_polygon(color = "black", fill = "lightblue")
In state_polys each “region” has a series of ordered
vertices, each with its own longitude and latitude. The “group” and
“order” of these vertices is used to draw the edges of that region’s
polygon.
To make a choropleth map, we must color each polygon according to a variable. This requires joining new data with the data frame that contains the polygons:
## USArrests data from the "datasets" package
data("USArrests")
## Create a data frame to join
state_values <- data.frame(State = tolower(rownames(USArrests)),
MurderRate = USArrests$Murder)
## Add MurderRate to the state polygons
full_data <- inner_join(x = state_polys, y = state_values, by = c("region" = "State"))
## Color the polygons using a fill variable
ggplot(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate)) +
geom_polygon(color = "black")
\(~\)
Because both polygons and lines/points use the “x” and “y” aesthetics, it is necessary to use local specification if you’d like to create a map involving multiple types of spatial features:
data("us.cities") ## Data on 1005 US cities
ggplot() +
geom_polygon(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate), color = "black") +
geom_point(data = us.cities, aes(x = long, y = lat), size = 0.1, color = "red")
The example above demonstrates this by adding various US cities to
the murder rate choropleth map we previously created (notice how
different data arguments are used within geom_polygon() and
geom_point()).
\(~\)
For this practice question you should use the “state” and “county”
polygons found in the maps package, as well as the
us.cities data frame:
state_polys <- map_data("state")
county_polys <- map_data("county")
data("us.cities")
Question #1: Filter the us.cities data
to include only state capitals (capital == 2) in the
contiguous United States (exclude cities in AK and HI). Then create a
map displaying state polygons (defined by black borders with
fill = NA, size = 0.3 and
alpha = 0.3) and counties (defined by blue borders with
fill = "white", size = 0.1 and
alpha = 0.5) that also includes state capitals as points
(with color = "yellow"). To exploit layering, you should
add the county polygons first and the cities last. Your end result
should be similar to the map shown below.
\(~\)
The polygon boundaries given by map_data() are easy to
understand, but they lack the flexibility provided by more complex
spatial storage structures. In most applications, geographic information
is stored in a shapefile, a complex format that must contain
the following components:
To practice working with a shapefile, you should download and extract this folder, which was originally obtained from Natural Earth Data). You should notice it contains .shp, .shx, and .dbf files (along with a few others).
To begin, we’ll load the shapefile using the
readShapeSpatial() function within the
maptools package:
world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries") ## You will need to change this file path
Notice we used a local path to the shapefiles, and we omitted the .shp/.shx/.dbf file extensions.
Using str() to learn about the structure of
world, we see it contains 5 “slots”, which are components
we can reference using @ (similar to $ for
data.frame objects):
str(world, max.level = 2) ## Only print the first 2 levels
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 242 obs. of 168 variables:
## .. ..- attr(*, "data_types")= chr [1:168] "C" "N" "N" "C" ...
## ..@ polygons :List of 242
## ..@ plotOrder : int [1:242] 240 76 203 17 196 211 226 182 134 232 ...
## ..@ bbox : num [1:2, 1:2] -180 -90 180 83.6
## .. ..- attr(*, "dimnames")=List of 2
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## ..$ comment: chr "FALSE"
Unfortunately, ggplot cannot work directly with
shapefiles, so we must use the fortify() function to
convert them to a workable format:
world_polys <- fortify(world) ## Converts the shapefile into a data frame of polygons
ggplot() + geom_polygon(data = world_polys, aes(x = long, y = lat, group = group), fill = "white", color = "blue")
Note: fortify() has been replaced by the
tidy() function in the broom package - but you
can still use it (at least for now).
Neither fortify() nor tidy() will attach
anything from your shapefile’s “data” slot to the data frame it creates.
To do this, you must create your own “id” variable that matches the
ordering of the “id” that was automatically generated:
# Create a new column called "id"
world_data <- mutate(world@data, id=as.character(0:241))
# Join the tidied shapefile with the data from our original shapefile.
world_polys_data <- left_join(world_polys, world_data, by = "id")
# Test it out by seeing if India, Brazil and Canada appear w/ different fill colors
ggplot() + geom_polygon(data = world_polys_data,
aes(x = long, y = lat, group = group,
fill = ifelse(NAME_EN %in% c("India", "Brazil", "Canada"),
"yes", "no")),
color = "black") + guides(fill = FALSE)
Question #2: The variable “GDP_MD” (already present
in the merged data frame created above) records the gross domestic
product (GDP) in millions of US dollars for the year 2019. For this
question, create a choropleth that colors each country from white to
green according to its GDP. Hint: you should use the
scale_fill_continuous() function.
\(~\)
Another popular tool used to create maps is leaflet, an open-source JavaScript library used by many professional organizations.
The base layer of a leaflet map is a set of map tiles.
Additional layers, such as polygons, lines, or point markers can be
added via piping:
## Just the base layer
leaflet() %>%
addTiles()
## Base layer plus polygons from the world shapefile
leaflet(world) %>%
addTiles() %>% addPolygons()
Note that leaflet() uses shapefiles directly, which is
different from ggplot.
\(~\)
By default, addTiles() will generate a base map using OpenStreetMap. This works
great for the vast majority of mapping applications, but if you’d like
something different Leaflet offers a list of alternative tile sets in
the providers object. These can be used via the
addProviderTiles() function:
## List of provider tile sets
head(providers)
## $OpenStreetMap
## [1] "OpenStreetMap"
##
## $OpenStreetMap.Mapnik
## [1] "OpenStreetMap.Mapnik"
##
## $OpenStreetMap.DE
## [1] "OpenStreetMap.DE"
##
## $OpenStreetMap.CH
## [1] "OpenStreetMap.CH"
##
## $OpenStreetMap.France
## [1] "OpenStreetMap.France"
##
## $OpenStreetMap.HOT
## [1] "OpenStreetMap.HOT"
## Example ()
leaflet() %>%
addProviderTiles(providers$HERE.satelliteDay)
\(~\)
In leaflet, polygon attributes are controlled by the
arguments of addPolygons(). Of particular importance is the
fillColor argument, which requires a vector of colors with
a length equal to the number of polygons.
The code below demonstrates how to use the fillColor to
color each country by the base-ten log of “POP_EST” using the “magma”
color palate:
## Vector of values to color by
col_vals <- log10(world@data$POP_EST)
col_vals[is.infinite(col_vals)] <- NA ## Replace infinities with NA
## Create a color palette (using the "magma" color scheme)
pal <- colorNumeric("magma", domain = col_vals)
leaflet(world) %>% addTiles() %>%
addPolygons(fillColor = pal(col_vals), weight = 1) %>%
addLegend(pal = pal, values = col_vals,
title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")
Leaflet will not generate a color legend by default, so in the
example above we created our own using the addLegend()
function.
Question #3: Use Leaflet to create a choropleth map displaying the variable “GDP_MD” using the “RdYlBu” color palette. Your map should include an appropriately formatted legend and use the “Stamen.Toner” provider tile set.
There are two ways to add interactive information to a Leaflet map:
Similar to plotly, Leaflet allows the use of HTML
commands:
## Set up the labels and popup
myLabels <- paste("<strong>", world@data$NAME_EN, "</strong>", "<br/>",
"Population:", world@data$POP_EST)
myPopups <- paste("Income Group", world@data$INCOME_GRP)
## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>%
addPolygons(fillColor = pal(col_vals),
weight = 1,
highlight = highlightOptions(
weight = 3,
color = "grey",
fillOpacity = 0.7,
bringToFront = TRUE),
label = lapply(myLabels, htmltools::HTML),
popup = myPopups) %>%
addLegend(pal = pal, values = col_vals,
title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")
The highlight argument dictates how a selected polygon
should be highlighted (in this case it will be brought to the front with
a grey border and an opaque fill)
Additionally, for labels/popup text with HTML commands to render
properly, the HTML() function in the htmltools
package needs to be applied to each element of the vector containing the
labels (achieved by the lapply() function).
\(~\)
Often we’d like to combine multiple data sources when creating a map.
For example, we might want to use the “Happy Planet” data to create an
interactive choropleth map. To accomplish this, we must pay special
attention to how the external data is merged with the @data
component of our shapefile.
hp <- read.csv("https://remiller1450.github.io/data/HappyPlanet.csv") ## contains 143 countries
### find any non-matches using anti_join
non_matches <- anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
non_matches$Country
## [1] "Burma" "China"
## [3] "Congo" "Congo, Dem. Rep. of the"
## [5] "Korea" "Macedonia"
Notice there are 6 countries in the Happy Planet data whose names do
not match any of the polygons in world; however, this is
not because those countries do not exist within world
(China as an example). We will rectify the issue by modifying the
country names inside the Happy Planet data set:
## Repair names (manually)
hp$Country[hp$Country == "China"] <- "People's Republic of China"
hp$Country[hp$Country == "Congo, Dem. Rep. of the"] <- "Democratic Republic of the Congo"
hp$Country[hp$Country == "Congo"] <- "Republic of the Congo"
hp$Country[hp$Country == "Korea"] <- "South Korea"
hp$Country[hp$Country == "Burma"] <- "Myanmar"
hp$Country[hp$Country == "Macedonia"] <- "North Macedonia"
### check for non-matches (none)
anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
## [1] Country Region Happiness LifeExpectancy Footprint
## [6] HLY HPI HPIRank GDPperCapita HDI
## [11] Population
## <0 rows> (or 0-length row.names)
We’re now ready to join the Happy Planet data with our shapefile and create a map:
world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries") ## Reload before overwriting
world@data <- left_join(x = world@data, y = hp, by = c("NAME_EN" = "Country"))
## Vector of values to color by
col_vals <- world@data$Happiness
col_vals[is.infinite(col_vals)] <- NA ## Replace infinities with NA
pal <- colorNumeric("viridis", domain = col_vals)
## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>%
addPolygons(fillColor = pal(col_vals),
weight = 1) %>%
addLegend(pal = pal, values = col_vals,
title = "Happiness", position = "bottomleft")
Question 4: The world.cities data frame
contained within the maps package includes the locations
and populations of more than 43,000 world cities. The code below creates
a filtered version of these data that contains only capitals with a
population greater than 500,000. For this question, join the information
in this filtered data set with the @data component of the
world shapefile. To do so, you will need to manually repair
the names of 7 countries. Next, use the information from
world.cities to color all countries with a capital city
population greater than 500,000 in red.
data("world.cities")
caps <- subset(world.cities, capital == 1 & pop > 500000)
Shown below is an example of this map (yours can use different colors):
\(~\)
The addMarkers() function is the most common way to
display geographic points on a leaflet map. At minimum, a marker only
needs a latitude and longitude. However, markers are most effective when
combined with labels and/or popups:
data("us.cities")
leaflet(data = filter(us.cities, pop > 500000)) %>%
addTiles() %>%
addMarkers(lng = ~long, lat = ~lat,
label = ~paste(name), ## Hover on a marker to see the city name
popup = ~paste("Population:", pop)) ## Click on a marker to see the population
In this example, we filtered us.cities to only include
cities with large populations so that we could avoid plotting too many
markers in a small geographic area.
However, we could accommodate all 1005 cities by using cluster markers:
leaflet(data = us.cities) %>%
addTiles() %>%
addMarkers(lng = ~long, lat = ~lat,
label = ~paste(name),
popup = ~paste("Population:", pop),
clusterOptions = markerClusterOptions())
Or by using circle markers with a low opacity to make over-plotting less of an issue:
leaflet(data = us.cities) %>%
addTiles() %>%
addCircleMarkers(lng = ~long, lat = ~lat,
label = ~paste(name),
popup = ~paste("Population:", pop),
fillOpacity = 0.20,
color = "yellow",
stroke = FALSE)
In the example above, the argument fillOpacity = 0.2
makes each marker 80% transparent, while stroke = FALSE
removes the solid outline around each marker.
Question #5: Using the world.cities
data, create a map where each city is depicted by a circle marker that
displays the city’s name and country when you hover over it, and the
city’s population when you click on it. Additionally, your map should
color capital cities red and non-capitals yellow and you should use
clusterOptions = markerClusterOptions() to display clusters
when zoomed out.
Shown below is an example of this map (yours should appear similar to it):